{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Lesson 3: Automating Model-Graded Evals"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "height": 47
   },
   "outputs": [],
   "source": [
    "import warnings\n",
    "warnings.filterwarnings('ignore')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Import the API keys for our 3rd party APIs."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "height": 47
   },
   "outputs": [],
   "source": [
    "from utils import get_circle_api_key\n",
    "cci_api_key = get_circle_api_key()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "height": 47
   },
   "outputs": [],
   "source": [
    "from utils import get_gh_api_key\n",
    "gh_api_key = get_gh_api_key()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "height": 47
   },
   "outputs": [],
   "source": [
    "from utils import get_openai_api_key\n",
    "openai_api_key = get_openai_api_key()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Set up our github branch"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "height": 64
   },
   "outputs": [],
   "source": [
    "from utils import get_repo_name\n",
    "course_repo = get_repo_name()\n",
    "course_repo"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "height": 64
   },
   "outputs": [],
   "source": [
    "from utils import get_branch\n",
    "course_branch = get_branch()\n",
    "course_branch"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## The sample application: AI-powered quiz generator\n",
    "\n",
    "Here is our sample application from the previous lesson that you will continue working on.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "height": 30
   },
   "outputs": [],
   "source": [
    "!cat app.py"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## A first model graded eval\n",
    "Build a prompt that tells the LLM to evaluate the output of the quizzes."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "height": 30
   },
   "outputs": [],
   "source": [
    "delimiter = \"####\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "height": 81
   },
   "outputs": [],
   "source": [
    "eval_system_prompt = f\"\"\"You are an assistant that evaluates \\\n",
    "  whether or not an assistant is producing valid quizzes.\n",
    "  The assistant should be producing output in the \\\n",
    "  format of Question N:{delimiter} <question N>?\"\"\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Simulate LLM response to make a first test."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "height": 132
   },
   "outputs": [],
   "source": [
    "llm_response = \"\"\"\n",
    "Question 1:#### What is the largest telescope in space called and what material is its mirror made of?\n",
    "\n",
    "Question 2:#### True or False: Water slows down the speed of light.\n",
    "\n",
    "Question 3:#### What did Marie and Pierre Curie discover in Paris?\n",
    "\"\"\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Build the prompt for the evaluation (eval)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "height": 285
   },
   "outputs": [],
   "source": [
    "eval_user_message = f\"\"\"You are evaluating a generated quiz \\\n",
    "based on the context that the assistant uses to create the quiz.\n",
    "  Here is the data:\n",
    "    [BEGIN DATA]\n",
    "    ************\n",
    "    [Response]: {llm_response}\n",
    "    ************\n",
    "    [END DATA]\n",
    "\n",
    "Read the response carefully and determine if it looks like \\\n",
    "a quiz or test. Do not evaluate if the information is correct\n",
    "only evaluate if the data is in the expected format.\n",
    "\n",
    "Output Y if the response is a quiz, \\\n",
    "output N if the response does not look like a quiz.\n",
    "\"\"\""
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Use langchain to build the prompt template for evaluation."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "height": 98
   },
   "outputs": [],
   "source": [
    "from langchain.prompts import ChatPromptTemplate\n",
    "eval_prompt = ChatPromptTemplate.from_messages([\n",
    "      (\"system\", eval_system_prompt),\n",
    "      (\"human\", eval_user_message),\n",
    "  ])"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Choose an LLM."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "height": 64
   },
   "outputs": [],
   "source": [
    "from langchain.chat_models import ChatOpenAI\n",
    "llm = ChatOpenAI(model=\"gpt-3.5-turbo\",\n",
    "                 temperature=0)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "From langchain import a parser to have a readable response."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "height": 47
   },
   "outputs": [],
   "source": [
    "from langchain.schema.output_parser import StrOutputParser\n",
    "output_parser = StrOutputParser()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Connect all pieces together in the variable 'chain'."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "height": 30
   },
   "outputs": [],
   "source": [
    "eval_chain = eval_prompt | llm | output_parser"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Test the 'good LLM' with positive response by invoking the eval_chain."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "height": 30
   },
   "outputs": [],
   "source": [
    "eval_chain.invoke({})"
   ]
  },
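  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Because the grader is instructed to answer with a single `Y` or `N`, the result can be checked with a plain assertion. A minimal sketch (the failure message is illustrative):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# A well-formed quiz should be graded \"Y\" by the eval chain\n",
    "verdict = eval_chain.invoke({}).strip()\n",
    "assert verdict == \"Y\", f\"Expected 'Y' for a valid quiz, got {verdict!r}\""
   ]
  },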
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Create function 'create_eval_chain'."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "height": 489
   },
   "outputs": [],
   "source": [
    "def create_eval_chain(\n",
    "    agent_response,\n",
    "    llm = ChatOpenAI(model=\"gpt-3.5-turbo\", temperature=0),\n",
    "    output_parser=StrOutputParser()\n",
    "):\n",
    "  delimiter = \"####\"\n",
    "  eval_system_prompt = f\"\"\"You are an assistant that evaluates whether or not an assistant is producing valid quizzes.\n",
    "  The assistant should be producing output in the format of Question N:{delimiter} <question N>?\"\"\"\n",
    "  \n",
    "  eval_user_message = f\"\"\"You are evaluating a generated quiz based on the context that the assistant uses to create the quiz.\n",
    "  Here is the data:\n",
    "    [BEGIN DATA]\n",
    "    ************\n",
    "    [Response]: {agent_response}\n",
    "    ************\n",
    "    [END DATA]\n",
    "\n",
    "Read the response carefully and determine if it looks like a quiz or test. Do not evaluate if the information is correct\n",
    "only evaluate if the data is in the expected format.\n",
    "\n",
    "Output Y if the response is a quiz, output N if the response does not look like a quiz.\n",
    "\"\"\"\n",
    "  eval_prompt = ChatPromptTemplate.from_messages([\n",
    "      (\"system\", eval_system_prompt),\n",
    "      (\"human\", eval_user_message),\n",
    "  ])\n",
    "\n",
    "  return eval_prompt | llm | output_parser"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Create new response to test in the eval_chain."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "height": 30
   },
   "outputs": [],
   "source": [
    "known_bad_result = \"There are lots of interesting facts. Tell me more about what you'd like to know\""
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "height": 30
   },
   "outputs": [],
   "source": [
    "bad_eval_chain = create_eval_chain(known_bad_result)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "height": 47
   },
   "outputs": [],
   "source": [
    "# response for wrong prompt\n",
    "bad_eval_chain.invoke({})"
   ]
  },
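  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A minimal sketch of the matching negative check (the failure message is illustrative):"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# A non-quiz response should be graded \"N\" by the eval chain\n",
    "verdict = bad_eval_chain.invoke({}).strip()\n",
    "assert verdict == \"N\", f\"Expected 'N' for a non-quiz response, got {verdict!r}\""
   ]
  },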
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Add new create_eval_chain into the 'test_assistant.py' file."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "height": 30
   },
   "outputs": [],
   "source": [
    "!cat test_assistant.py"
   ]
  },
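  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A model-graded test built on `create_eval_chain` might look roughly like the sketch below; the file shown above is authoritative, and the test name here is illustrative."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Sketch of a pytest-style model-graded test (illustrative)\n",
    "def test_known_good_quiz_is_graded_as_quiz():\n",
    "    # llm_response is the simulated quiz defined earlier\n",
    "    chain = create_eval_chain(llm_response)\n",
    "    assert chain.invoke({}).strip() == \"Y\""
   ]
  },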
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "height": 47
   },
   "outputs": [],
   "source": [
    "# Command to see the content of the file\n",
    "!cat test_release_evals.py"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "**_Note:_** if you want to inspect the config run `!head circle_config.yml`\n",
    "\n",
    "Command: !cat circle_config.yml\n"
   ]
  },
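  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "!cat circle_config.yml"
   ]
  },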
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Push new files into CircleCI's Git repo."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "height": 149
   },
   "outputs": [],
   "source": [
    "from utils import push_files\n",
    "push_files(course_repo, \n",
    "           course_branch, \n",
    "           [\"app.py\",\n",
    "            \"test_release_evals.py\",\n",
    "            \"test_assistant.py\"],\n",
    "           config=\"circle_config.yml\"\n",
    "          )"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Trigger the Release Evaluations."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {
    "height": 132
   },
   "outputs": [],
   "source": [
    "from utils import trigger_release_evals\n",
    "trigger_release_evals(course_repo, \n",
    "                      course_branch, \n",
    "                      [\"app.py\",\n",
    "                       \"test_assistant.py\",\n",
    "                       \"test_release_evals.py\"],\n",
    "                      cci_api_key)"
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.9.18"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
